20 research outputs found

    The state of SQL-on-Hadoop in the cloud

    Managed Hadoop in the cloud, especially SQL-on-Hadoop, has been gaining attention recently. On Platform-as-a-Service (PaaS) offerings, analytical services like Hive and Spark come preconfigured for general-purpose use and ready to run, giving companies quick entry and on-demand deployment of ready-made SQL-like solutions for their big data needs. This study evaluates cloud services from an end-user perspective, comparing providers including Microsoft Azure, Amazon Web Services, Google Cloud, and Rackspace. The study focuses on the performance, readiness, scalability, and cost-effectiveness of the different solutions at entry/test-level cluster sizes. Results are based on over 15,000 Hive queries derived from the industry-standard TPC-H benchmark. The study is framed within the ALOJA research project, which features an open source benchmarking and analysis platform that has recently been extended to support SQL-on-Hadoop engines. The ALOJA Project aims to lower the total cost of ownership (TCO) of big data deployments and to study their performance characteristics for optimization. The study benchmarks cloud providers across a diverse range of instance types and uses input data scales from 1GB to 1TB in order to survey the popular entry-level PaaS SQL-on-Hadoop solutions, thereby establishing a common results base upon which the project can carry out subsequent research. Initial results already show the main performance trends with respect to hardware and software configuration and pricing, as well as the similarities and architectural differences of the evaluated PaaS solutions. Whereas some providers focus on decoupling storage and computing resources while offering network-based elastic storage, others choose to keep Hadoop's local processing model for high performance, at the cost of flexibility. Results also show the importance of application-level tuning and how keeping hardware and software stacks up to date can influence performance even more than replicating the on-premises model in the cloud. This work is partially supported by the Microsoft Azure for Research program, the European Research Council (ERC) under the EU's Horizon 2020 programme (GA 639595), the Spanish Ministry of Education (TIN2015-65316-P), and the Generalitat de Catalunya (2014-SGR-1051). Peer Reviewed. Postprint (author's final draft).

    Genotype-Phenotype Correlation in NF1: Evidence for a More Severe Phenotype Associated with Missense Mutations Affecting NF1 Codons 844–848

    Neurofibromatosis type 1 (NF1), a common genetic disorder with a birth incidence of 1:2,000–3,000, is characterized by a highly variable clinical presentation. To date, only two clinically relevant intragenic genotype-phenotype correlations have been reported for NF1: missense mutations affecting p.Arg1809 and a single amino acid deletion, p.Met922del. Both variants predispose to a distinct mild NF1 phenotype with neither externally visible cutaneous/plexiform neurofibromas nor other tumors. Here, we report 162 individuals (129 unrelated probands and 33 affected relatives) heterozygous for a constitutional missense mutation affecting one of five neighboring NF1 codons (Leu844, Cys845, Ala846, Leu847, and Gly848) located in the cysteine-serine-rich domain (CSRD). Collectively, these recurrent missense mutations affect ~0.8% of unrelated NF1 mutation-positive probands in the University of Alabama at Birmingham (UAB) cohort. Major superficial plexiform neurofibromas and symptomatic spinal neurofibromas were more prevalent in these individuals than in classic NF1-affected cohorts (both p < 0.0001). Nearly half of the individuals had symptomatic or asymptomatic optic pathway gliomas and/or skeletal abnormalities. Additionally, variants in this region seem to confer a high predisposition to develop malignancies compared with the general NF1-affected population (p = 0.0061). Our results demonstrate that these NF1 missense mutations, although located outside the GAP-related domain, may be an important risk factor for a severe presentation. A genotype-phenotype correlation at the NF1 region 844–848 exists and will be valuable in the management and genetic counseling of a significant number of individuals.

    Extracting detailed metabolic information and connections from mammalian gut microbiomes via metaproteomics

    A diverse community of bacteria populates the mammalian gastrointestinal tract. These populations exist in a balance with the host, assisting with key functions, particularly the metabolism of intractable fibers and immune modulation. Disruption of this balance can lead to diseases such as infection, inflammatory bowel syndrome, and obesity. Common symptoms include chronic pain, chronic inflammation, and altered metabolism. Several taxonomic classifications of bacteria have been associated with these diseases, but recent studies have indicated that these findings are not always statistically valid. One explanation is that microbial communities can vary substantially between individuals, and even across time, even when the individuals have a similar health status. Microbial function, however, is a promising arena in which to study disease scenarios. Omics methods, which measure the entire gene content of a community, are a particularly powerful set of techniques with which to analyze the potential and active function of microbiome communities. Metaproteomics detects and quantifies proteins directly from environmental samples and can be used to measure gut microbiome functional activity. This dissertation applied LC-MS/MS-based metaproteomics and metagenomic sequencing to study gut microbiome function in adult humans with Crohn’s disease, preterm infants with necrotizing enterocolitis, and obese and morphine-treated mice. Intense variation across time and individuals was observed at the discrete protein sequence level; however, specific functions such as reactions and metabolic modules were shown to be more conserved. Fully connected metabolic networks and pathways were reconstructed from these metaproteomes, and specific metabolic functions are shown to be affected by necrotizing enterocolitis, diet-induced obesity, and morphine. This dissertation makes a major step forward by showing that discrete metabolic reactions can be effectively analyzed using metaproteomic data.

    Efficiently Updating Materialized Views

    Query processing can be sped up by keeping frequently accessed users' views materialized. However, the need to access base relations in response to queries can be avoided only if the materialized view is adequately maintained. We propose a method in which all database updates to base relations are first filtered to remove from consideration those that cannot possibly affect the view. The conditions given for the detection of updates of this type, called irrelevant updates, are necessary and sufficient and are independent of the database state. For the remaining database updates, a differential algorithm can be applied to re-evaluate the view expression. The algorithm proposed exploits the knowledge provided by both the view definition expression and the database update operations. 1 Introduction In a relational database system, a database may be composed of both base and derived relations. A de- This work was supported in part by scholarship No. 35957 from Consejo Nacional de Cien..
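
    The filtering idea in the abstract above can be illustrated with a minimal sketch. It covers only the simplest case, a view defined by a single selection over one base relation, and it is not the paper's algorithm. The schema, the predicate, and the names view_predicate, is_irrelevant, and apply_updates are all hypothetical, introduced for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical base-relation schema: (id, dept, salary).
Row = Tuple[int, str, int]


def view_predicate(row: Row) -> bool:
    """Hypothetical view definition: rows of the base relation with salary > 50000."""
    return row[2] > 50000


def is_irrelevant(row: Row) -> bool:
    """For a selection-only view, an inserted or deleted row that fails the
    predicate cannot change the view. The check looks only at the row and the
    view definition, never at the current database state."""
    return not view_predicate(row)


def apply_updates(view: Set[Row], inserts: Iterable[Row], deletes: Iterable[Row]) -> int:
    """Filter out irrelevant updates, then apply the rest to the materialized
    view incrementally instead of recomputing it from the base relation.
    Returns the number of updates that were filtered out."""
    skipped = 0
    for row in inserts:
        if is_irrelevant(row):
            skipped += 1
        else:
            view.add(row)
    for row in deletes:
        if is_irrelevant(row):
            skipped += 1
        else:
            view.discard(row)
    return skipped


# Usage: two of the four updates fail the predicate and are never applied.
view: Set[Row] = {(1, "a", 60000)}
skipped = apply_updates(
    view,
    inserts=[(2, "b", 40000), (3, "c", 70000)],
    deletes=[(1, "a", 60000), (4, "d", 10000)],
)
```

    Views involving joins or projections need more than this per-row check; the sketch only shows why a state-independent filter can avoid touching the view at all.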
